Setup

Exported source
import os
import base64

import PIL
import mimetypes
import inspect

from typing import Union
from functools import wraps
from google import genai
from google.genai import types

from fastcore.all import *
from fastcore.docments import *
Exported source
all_model_types = {
    "gemini-2.0-flash": "llm-vertex#gemini-2.0-flash",
    "gemini-2.0-flash-001": "llm-vertex#gemini-2.0-flash",
    "gemini-2.0-pro-exp-02-05": "llm#gemini-2.0-pro",
    "gemini-2.0-flash-lite-preview-02-05": "llm#gemini-2.0-flash-lite",
    "gemini-1.5-flash": "llm-vertex#gemini-1.5-flash",
    "gemini-1.5-pro": "llm-vertex#gemini-1.5-pro",
    "gemini-1.5-pro-002": "llm-vertex#gemini-1.5-pro",
    "gemini-1.5-flash-8b": "llm#gemini-1.5-flash-8b",
    "gemini-2.0-flash-thinking-exp-01-21": "llm-thinking#gemini-2.0-flash-thinking",
    "imagen-3.0-generate-002": "imagen#imagen-3.0"
}

thinking_models = [m for m in all_model_types if "thinking" in all_model_types[m]]

imagen_models = [m for m in all_model_types if "imagen" in all_model_types[m]]

vertex_models = [m for m in all_model_types if "vertex" in all_model_types[m]]

models = [m for m in all_model_types if "llm" in all_model_types[m]]

models
['gemini-2.0-flash',
 'gemini-2.0-flash-001',
 'gemini-2.0-pro-exp-02-05',
 'gemini-2.0-flash-lite-preview-02-05',
 'gemini-1.5-flash',
 'gemini-1.5-pro',
 'gemini-1.5-pro-002',
 'gemini-1.5-flash-8b',
 'gemini-2.0-flash-thinking-exp-01-21']

Gemini has several types of models and only some of those work with VertexAI client. Input wise, essentially all models support images and text, and since Gemini 1.5 all (except gemini-2.0-flash-thinking-exp-01-21) support audio and video input.

Right now only imagen models support non text output, but image and audio outputs are coming to Gemini 2 “soon”.

Main models support function calling, structured output, code executions (with a few exception on the preview models) and all models support streaming and system prompts.

Although it would be nice to provide a uniform library interface with cosette and claudette, it’s probably a bit too complex for the moment.

from IPython.display import Markdown
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", None)
Markdown("* " + "* ".join([f"`{model.name}`:  {model.description}\n" for model in genai.Client(api_key=GEMINI_API_KEY).models.list()]))
  • models/chat-bison-001: A legacy text-only model optimized for chat conversations
  • models/text-bison-001: A legacy model that understands text and generates text as an output
  • models/embedding-gecko-001: Obtain a distributed representation of a text.
  • models/gemini-1.0-pro-latest: The original Gemini 1.0 Pro model. This model will be discontinued on February 15th, 2025. Move to a newer Gemini version.
  • models/gemini-1.0-pro: The best model for scaling across a wide range of tasks
  • models/gemini-pro: The best model for scaling across a wide range of tasks
  • models/gemini-1.0-pro-001: The original Gemini 1.0 Pro model version that supports tuning. Gemini 1.0 Pro will be discontinued on February 15th, 2025. Move to a newer Gemini version.
  • models/gemini-1.0-pro-vision-latest: The original Gemini 1.0 Pro Vision model version which was optimized for image understanding. Gemini 1.0 Pro Vision was deprecated on July 12, 2024. Move to a newer Gemini version.
  • models/gemini-pro-vision: The original Gemini 1.0 Pro Vision model version which was optimized for image understanding. Gemini 1.0 Pro Vision was deprecated on July 12, 2024. Move to a newer Gemini version.
  • models/gemini-1.5-pro-latest: Alias that points to the most recent production (non-experimental) release of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens.
  • models/gemini-1.5-pro-001: Stable version of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens, released in May of 2024.
  • models/gemini-1.5-pro-002: Stable version of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens, released in September of 2024.
  • models/gemini-1.5-pro: Stable version of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens, released in May of 2024.
  • models/gemini-1.5-flash-latest: Alias that points to the most recent production (non-experimental) release of Gemini 1.5 Flash, our fast and versatile multimodal model for scaling across diverse tasks.
  • models/gemini-1.5-flash-001: Stable version of Gemini 1.5 Flash, our fast and versatile multimodal model for scaling across diverse tasks, released in May of 2024.
  • models/gemini-1.5-flash-001-tuning: Version of Gemini 1.5 Flash that supports tuning, our fast and versatile multimodal model for scaling across diverse tasks, released in May of 2024.
  • models/gemini-1.5-flash: Alias that points to the most recent stable version of Gemini 1.5 Flash, our fast and versatile multimodal model for scaling across diverse tasks.
  • models/gemini-1.5-flash-002: Stable version of Gemini 1.5 Flash, our fast and versatile multimodal model for scaling across diverse tasks, released in September of 2024.
  • models/gemini-1.5-flash-8b: Stable version of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model, released in October of 2024.
  • models/gemini-1.5-flash-8b-001: Stable version of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model, released in October of 2024.
  • models/gemini-1.5-flash-8b-latest: Alias that points to the most recent production (non-experimental) release of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model, released in October of 2024.
  • models/gemini-1.5-flash-8b-exp-0827: Experimental release (August 27th, 2024) of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model. Replaced by Gemini-1.5-flash-8b-001 (stable).
  • models/gemini-1.5-flash-8b-exp-0924: Experimental release (September 24th, 2024) of Gemini 1.5 Flash-8B, our smallest and most cost effective Flash model. Replaced by Gemini-1.5-flash-8b-001 (stable).
  • models/gemini-2.0-flash-exp: Gemini 2.0 Flash Experimental
  • models/gemini-2.0-flash: Gemini 2.0 Flash
  • models/gemini-2.0-flash-001: Stable version of Gemini 2.0 Flash, our fast and versatile multimodal model for scaling across diverse tasks, released in January of 2025.
  • models/gemini-2.0-flash-lite-preview: Preview release (February 5th, 2025) of Gemini 2.0 Flash Lite
  • models/gemini-2.0-flash-lite-preview-02-05: Preview release (February 5th, 2025) of Gemini 2.0 Flash Lite
  • models/gemini-2.0-pro-exp: Experimental release (February 5th, 2025) of Gemini 2.0 Pro
  • models/gemini-2.0-pro-exp-02-05: Experimental release (February 5th, 2025) of Gemini 2.0 Pro
  • models/gemini-exp-1206: Experimental release (February 5th, 2025) of Gemini 2.0 Pro
  • models/gemini-2.0-flash-thinking-exp-01-21: Experimental release (January 21st, 2025) of Gemini 2.0 Flash Thinking
  • models/gemini-2.0-flash-thinking-exp: Experimental release (January 21st, 2025) of Gemini 2.0 Flash Thinking
  • models/gemini-2.0-flash-thinking-exp-1219: Gemini 2.0 Flash Thinking Experimental
  • models/learnlm-1.5-pro-experimental: Alias that points to the most recent stable version of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens.
  • models/embedding-001: Obtain a distributed representation of a text.
  • models/text-embedding-004: Obtain a distributed representation of a text.
  • models/aqa: Model trained to return answers to questions that are grounded in provided sources, along with estimating answerable probability.
  • models/imagen-3.0-generate-002: Vertex served Imagen 3.0 002 model

The Genai API exposes way more models than what we have included, but most of those are outdated, the list is not properly maintained and not all models behave in the same way. Although all of them will still be available, it’s is fine to just restrict to a few select ones.

Genai SDK

c = genai.Client(api_key=GEMINI_API_KEY)

This how the Gemini SDK gives access to the API. The client itself, in particular has a number of subclients/methods that give access to the different endpoint and functionalities. The main ones we are interested in are models, that is the main interface with all the models, and chat which essentially wraps the former with a few convenience functionalities for handling message history.

model = models[0]
model
'gemini-2.0-flash'
r = c.models.generate_content(model=model, contents="Hi Gemini! Are you ready to work?")
r
GenerateContentResponse(candidates=[Candidate(content=Content(parts=[Part(video_metadata=None, thought=None, code_execution_result=None, executable_code=None, file_data=None, function_call=None, function_response=None, inline_data=None, text='Yes, I am ready to work! What can I do for you?\n')], role='model'), citation_metadata=None, finish_message=None, token_count=None, avg_logprobs=-0.36470311880111694, finish_reason=<FinishReason.STOP: 'STOP'>, grounding_metadata=None, index=None, logprobs_result=None, safety_ratings=None)], model_version='gemini-2.0-flash', prompt_feedback=None, usage_metadata=GenerateContentResponseUsageMetadata(cached_content_token_count=None, candidates_token_count=16, prompt_token_count=9, total_token_count=25), automatic_function_calling_history=[], parsed=None)
r.to_json_dict()
{'candidates': [{'content': {'parts': [{'text': 'Yes, I am ready to work! What can I do for you?\n'}],
    'role': 'model'},
   'avg_logprobs': -0.36470311880111694,
   'finish_reason': 'STOP'}],
 'model_version': 'gemini-2.0-flash',
 'usage_metadata': {'candidates_token_count': 16,
  'prompt_token_count': 9,
  'total_token_count': 25},
 'automatic_function_calling_history': []}
print(r.text)
Yes, I am ready to work! What can I do for you?

In typical Google fashion (they really like their protobufs), the response is a nested mess of pydantic models. Luckily they all have a few convenience methods to make everything a bit more accessible.

help(genai._common.BaseModel)
Help on class BaseModel in module google.genai._common:

class BaseModel(pydantic.main.BaseModel)
 |  BaseModel() -> None
 |
 |  Method resolution order:
 |      BaseModel
 |      pydantic.main.BaseModel
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  to_json_dict(self) -> dict[str, object]
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __weakref__
 |      list of weak references to the object
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __abstractmethods__ = frozenset()
 |
 |  __annotations__ = {}
 |
 |  __class_vars__ = set()
 |
 |  __private_attributes__ = {}
 |
 |  __pydantic_complete__ = True
 |
 |  __pydantic_computed_fields__ = {}
 |
 |  __pydantic_core_schema__ = {'cls': <class 'google.genai._common.BaseMo...
 |
 |  __pydantic_custom_init__ = False
 |
 |  __pydantic_decorators__ = DecoratorInfos(validators={}, field_validato...
 |
 |  __pydantic_fields__ = {}
 |
 |  __pydantic_generic_metadata__ = {'args': (), 'origin': None, 'paramete...
 |
 |  __pydantic_parent_namespace__ = None
 |
 |  __pydantic_post_init__ = None
 |
 |  __pydantic_serializer__ = SchemaSerializer(serializer=Model(
 |      Model...
 |
 |  __pydantic_validator__ = SchemaValidator(title="BaseModel", validator=...
 |
 |  __signature__ = <Signature () -> None>
 |
 |  model_config = {'alias_generator': <function to_camel>, 'arbitrary_typ...
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from pydantic.main.BaseModel:
 |
 |  __copy__(self) -> 'Self'
 |      Returns a shallow copy of the model.
 |
 |  __deepcopy__(self, memo: 'dict[int, Any] | None' = None) -> 'Self'
 |      Returns a deep copy of the model.
 |
 |  __delattr__(self, item: 'str') -> 'Any'
 |      Implement delattr(self, name).
 |
 |  __eq__(self, other: 'Any') -> 'bool'
 |      Return self==value.
 |
 |  __getattr__(self, item: 'str') -> 'Any'
 |
 |  __getstate__(self) -> 'dict[Any, Any]'
 |      Helper for pickle.
 |
 |  __init__(self, /, **data: 'Any') -> 'None'
 |      Create a new model by parsing and validating input data from keyword arguments.
 |
 |      Raises [`ValidationError`][pydantic_core.ValidationError] if the input data cannot be
 |      validated to form a valid model.
 |
 |      `self` is explicitly positional-only to allow `self` as a field name.
 |
 |  __iter__(self) -> 'TupleGenerator'
 |      So `dict(model)` works.
 |
 |  __pretty__(self, fmt: 'typing.Callable[[Any], Any]', **kwargs: 'Any') -> 'typing.Generator[Any, None, None]' from pydantic._internal._repr.Representation
 |      Used by devtools (https://python-devtools.helpmanual.io/) to pretty print objects.
 |
 |  __replace__(self, **changes: 'Any') -> 'Self'
 |      # Because we make use of `@dataclass_transform()`, `__replace__` is already synthesized by
 |      # type checkers, so we define the implementation in this `if not TYPE_CHECKING:` block:
 |
 |  __repr__(self) -> 'str'
 |      Return repr(self).
 |
 |  __repr_args__(self) -> '_repr.ReprArgs'
 |
 |  __repr_name__(self) -> 'str' from pydantic._internal._repr.Representation
 |      Name of the instance's class, used in __repr__.
 |
 |  __repr_recursion__(self, object: 'Any') -> 'str' from pydantic._internal._repr.Representation
 |      Returns the string representation of a recursive object.
 |
 |  __repr_str__(self, join_str: 'str') -> 'str' from pydantic._internal._repr.Representation
 |
 |  __rich_repr__(self) -> 'RichReprResult' from pydantic._internal._repr.Representation
 |      Used by Rich (https://rich.readthedocs.io/en/stable/pretty.html) to pretty print objects.
 |
 |  __setattr__(self, name: 'str', value: 'Any') -> 'None'
 |      Implement setattr(self, name, value).
 |
 |  __setstate__(self, state: 'dict[Any, Any]') -> 'None'
 |
 |  __str__(self) -> 'str'
 |      Return str(self).
 |
 |  copy(self, *, include: 'AbstractSetIntStr | MappingIntStrAny | None' = None, exclude: 'AbstractSetIntStr | MappingIntStrAny | None' = None, update: 'Dict[str, Any] | None' = None, deep: 'bool' = False) -> 'Self'
 |      Returns a copy of the model.
 |
 |      !!! warning "Deprecated"
 |          This method is now deprecated; use `model_copy` instead.
 |
 |      If you need `include` or `exclude`, use:
 |
 |      ```python {test="skip" lint="skip"}
 |      data = self.model_dump(include=include, exclude=exclude, round_trip=True)
 |      data = {**data, **(update or {})}
 |      copied = self.model_validate(data)
 |      ```
 |
 |      Args:
 |          include: Optional set or mapping specifying which fields to include in the copied model.
 |          exclude: Optional set or mapping specifying which fields to exclude in the copied model.
 |          update: Optional dictionary of field-value pairs to override field values in the copied model.
 |          deep: If True, the values of fields that are Pydantic models will be deep-copied.
 |
 |      Returns:
 |          A copy of the model with included, excluded and updated fields as specified.
 |
 |  dict(self, *, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False) -> 'Dict[str, Any]'
 |
 |  json(self, *, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, encoder: 'Callable[[Any], Any] | None' = PydanticUndefined, models_as_dict: 'bool' = PydanticUndefined, **dumps_kwargs: 'Any') -> 'str'
 |
 |  model_copy(self, *, update: 'Mapping[str, Any] | None' = None, deep: 'bool' = False) -> 'Self'
 |      Usage docs: https://docs.pydantic.dev/2.10/concepts/serialization/#model_copy
 |
 |      Returns a copy of the model.
 |
 |      Args:
 |          update: Values to change/add in the new model. Note: the data is not validated
 |              before creating the new model. You should trust this data.
 |          deep: Set to `True` to make a deep copy of the model.
 |
 |      Returns:
 |          New model instance.
 |
 |  model_dump(self, *, mode: "Literal['json', 'python'] | str" = 'python', include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, context: 'Any | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, round_trip: 'bool' = False, warnings: "bool | Literal['none', 'warn', 'error']" = True, serialize_as_any: 'bool' = False) -> 'dict[str, Any]'
 |      Usage docs: https://docs.pydantic.dev/2.10/concepts/serialization/#modelmodel_dump
 |
 |      Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
 |
 |      Args:
 |          mode: The mode in which `to_python` should run.
 |              If mode is 'json', the output will only contain JSON serializable types.
 |              If mode is 'python', the output may contain non-JSON-serializable Python objects.
 |          include: A set of fields to include in the output.
 |          exclude: A set of fields to exclude from the output.
 |          context: Additional context to pass to the serializer.
 |          by_alias: Whether to use the field's alias in the dictionary key if defined.
 |          exclude_unset: Whether to exclude fields that have not been explicitly set.
 |          exclude_defaults: Whether to exclude fields that are set to their default value.
 |          exclude_none: Whether to exclude fields that have a value of `None`.
 |          round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
 |          warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors,
 |              "error" raises a [`PydanticSerializationError`][pydantic_core.PydanticSerializationError].
 |          serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.
 |
 |      Returns:
 |          A dictionary representation of the model.
 |
 |  model_dump_json(self, *, indent: 'int | None' = None, include: 'IncEx | None' = None, exclude: 'IncEx | None' = None, context: 'Any | None' = None, by_alias: 'bool' = False, exclude_unset: 'bool' = False, exclude_defaults: 'bool' = False, exclude_none: 'bool' = False, round_trip: 'bool' = False, warnings: "bool | Literal['none', 'warn', 'error']" = True, serialize_as_any: 'bool' = False) -> 'str'
 |      Usage docs: https://docs.pydantic.dev/2.10/concepts/serialization/#modelmodel_dump_json
 |
 |      Generates a JSON representation of the model using Pydantic's `to_json` method.
 |
 |      Args:
 |          indent: Indentation to use in the JSON output. If None is passed, the output will be compact.
 |          include: Field(s) to include in the JSON output.
 |          exclude: Field(s) to exclude from the JSON output.
 |          context: Additional context to pass to the serializer.
 |          by_alias: Whether to serialize using field aliases.
 |          exclude_unset: Whether to exclude fields that have not been explicitly set.
 |          exclude_defaults: Whether to exclude fields that are set to their default value.
 |          exclude_none: Whether to exclude fields that have a value of `None`.
 |          round_trip: If True, dumped values should be valid as input for non-idempotent types such as Json[T].
 |          warnings: How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors,
 |              "error" raises a [`PydanticSerializationError`][pydantic_core.PydanticSerializationError].
 |          serialize_as_any: Whether to serialize fields with duck-typing serialization behavior.
 |
 |      Returns:
 |          A JSON string representation of the model.
 |
 |  model_post_init(self, _BaseModel__context: 'Any') -> 'None'
 |      Override this method to perform additional initialization after `__init__` and `model_construct`.
 |      This is useful if you want to do some validation that requires the entire model to be initialized.
 |
 |  ----------------------------------------------------------------------
 |  Class methods inherited from pydantic.main.BaseModel:
 |
 |  __class_getitem__(typevar_values: 'type[Any] | tuple[type[Any], ...]') -> 'type[BaseModel] | _forward_ref.PydanticRecursiveRef'
 |
 |  __get_pydantic_core_schema__(source: 'type[BaseModel]', handler: 'GetCoreSchemaHandler', /) -> 'CoreSchema'
 |      Hook into generating the model's CoreSchema.
 |
 |      Args:
 |          source: The class we are generating a schema for.
 |              This will generally be the same as the `cls` argument if this is a classmethod.
 |          handler: A callable that calls into Pydantic's internal CoreSchema generation logic.
 |
 |      Returns:
 |          A `pydantic-core` `CoreSchema`.
 |
 |  __get_pydantic_json_schema__(core_schema: 'CoreSchema', handler: 'GetJsonSchemaHandler', /) -> 'JsonSchemaValue'
 |      Hook into generating the model's JSON schema.
 |
 |      Args:
 |          core_schema: A `pydantic-core` CoreSchema.
 |              You can ignore this argument and call the handler with a new CoreSchema,
 |              wrap this CoreSchema (`{'type': 'nullable', 'schema': current_schema}`),
 |              or just call the handler with the original schema.
 |          handler: Call into Pydantic's internal JSON schema generation.
 |              This will raise a `pydantic.errors.PydanticInvalidForJsonSchema` if JSON schema
 |              generation fails.
 |              Since this gets called by `BaseModel.model_json_schema` you can override the
 |              `schema_generator` argument to that function to change JSON schema generation globally
 |              for a type.
 |
 |      Returns:
 |          A JSON schema, as a Python object.
 |
 |  __pydantic_init_subclass__(**kwargs: 'Any') -> 'None'
 |      This is intended to behave just like `__init_subclass__`, but is called by `ModelMetaclass`
 |      only after the class is actually fully initialized. In particular, attributes like `model_fields` will
 |      be present when this is called.
 |
 |      This is necessary because `__init_subclass__` will always be called by `type.__new__`,
 |      and it would require a prohibitively large refactor to the `ModelMetaclass` to ensure that
 |      `type.__new__` was called in such a manner that the class would already be sufficiently initialized.
 |
 |      This will receive the same `kwargs` that would be passed to the standard `__init_subclass__`, namely,
 |      any kwargs passed to the class definition that aren't used internally by pydantic.
 |
 |      Args:
 |          **kwargs: Any keyword arguments passed to the class definition that aren't used internally
 |              by pydantic.
 |
 |  construct(_fields_set: 'set[str] | None' = None, **values: 'Any') -> 'Self'
 |
 |  from_orm(obj: 'Any') -> 'Self'
 |
 |  model_construct(_fields_set: 'set[str] | None' = None, **values: 'Any') -> 'Self'
 |      Creates a new instance of the `Model` class with validated data.
 |
 |      Creates a new model setting `__dict__` and `__pydantic_fields_set__` from trusted or pre-validated data.
 |      Default values are respected, but no other validation is performed.
 |
 |      !!! note
 |          `model_construct()` generally respects the `model_config.extra` setting on the provided model.
 |          That is, if `model_config.extra == 'allow'`, then all extra passed values are added to the model instance's `__dict__`
 |          and `__pydantic_extra__` fields. If `model_config.extra == 'ignore'` (the default), then all extra passed values are ignored.
 |          Because no validation is performed with a call to `model_construct()`, having `model_config.extra == 'forbid'` does not result in
 |          an error if extra values are passed, but they will be ignored.
 |
 |      Args:
 |          _fields_set: A set of field names that were originally explicitly set during instantiation. If provided,
 |              this is directly used for the [`model_fields_set`][pydantic.BaseModel.model_fields_set] attribute.
 |              Otherwise, the field names from the `values` argument will be used.
 |          values: Trusted or pre-validated data dictionary.
 |
 |      Returns:
 |          A new instance of the `Model` class with validated data.
 |
 |  model_json_schema(by_alias: 'bool' = True, ref_template: 'str' = '#/$defs/{model}', schema_generator: 'type[GenerateJsonSchema]' = <class 'pydantic.json_schema.GenerateJsonSchema'>, mode: 'JsonSchemaMode' = 'validation') -> 'dict[str, Any]'
 |      Generates a JSON schema for a model class.
 |
 |      Args:
 |          by_alias: Whether to use attribute aliases or not.
 |          ref_template: The reference template.
 |          schema_generator: To override the logic used to generate the JSON schema, as a subclass of
 |              `GenerateJsonSchema` with your desired modifications
 |          mode: The mode in which to generate the schema.
 |
 |      Returns:
 |          The JSON schema for the given model class.
 |
 |  model_parametrized_name(params: 'tuple[type[Any], ...]') -> 'str'
 |      Compute the class name for parametrizations of generic classes.
 |
 |      This method can be overridden to achieve a custom naming scheme for generic BaseModels.
 |
 |      Args:
 |          params: Tuple of types of the class. Given a generic class
 |              `Model` with 2 type variables and a concrete model `Model[str, int]`,
 |              the value `(str, int)` would be passed to `params`.
 |
 |      Returns:
 |          String representing the new class where `params` are passed to `cls` as type variables.
 |
 |      Raises:
 |          TypeError: Raised when trying to generate concrete names for non-generic models.
 |
 |  model_rebuild(*, force: 'bool' = False, raise_errors: 'bool' = True, _parent_namespace_depth: 'int' = 2, _types_namespace: 'MappingNamespace | None' = None) -> 'bool | None'
 |      Try to rebuild the pydantic-core schema for the model.
 |
 |      This may be necessary when one of the annotations is a ForwardRef which could not be resolved during
 |      the initial attempt to build the schema, and automatic rebuilding fails.
 |
 |      Args:
 |          force: Whether to force the rebuilding of the model schema, defaults to `False`.
 |          raise_errors: Whether to raise errors, defaults to `True`.
 |          _parent_namespace_depth: The depth level of the parent namespace, defaults to 2.
 |          _types_namespace: The types namespace, defaults to `None`.
 |
 |      Returns:
 |          Returns `None` if the schema is already "complete" and rebuilding was not required.
 |          If rebuilding _was_ required, returns `True` if rebuilding was successful, otherwise `False`.
 |
 |  model_validate(obj: 'Any', *, strict: 'bool | None' = None, from_attributes: 'bool | None' = None, context: 'Any | None' = None) -> 'Self'
 |      Validate a pydantic model instance.
 |
 |      Args:
 |          obj: The object to validate.
 |          strict: Whether to enforce types strictly.
 |          from_attributes: Whether to extract data from object attributes.
 |          context: Additional context to pass to the validator.
 |
 |      Raises:
 |          ValidationError: If the object could not be validated.
 |
 |      Returns:
 |          The validated model instance.
 |
 |  model_validate_json(json_data: 'str | bytes | bytearray', *, strict: 'bool | None' = None, context: 'Any | None' = None) -> 'Self'
 |      Usage docs: https://docs.pydantic.dev/2.10/concepts/json/#json-parsing
 |
 |      Validate the given JSON data against the Pydantic model.
 |
 |      Args:
 |          json_data: The JSON data to validate.
 |          strict: Whether to enforce types strictly.
 |          context: Extra variables to pass to the validator.
 |
 |      Returns:
 |          The validated Pydantic model.
 |
 |      Raises:
 |          ValidationError: If `json_data` is not a JSON string or the object could not be validated.
 |
 |  model_validate_strings(obj: 'Any', *, strict: 'bool | None' = None, context: 'Any | None' = None) -> 'Self'
 |      Validate the given object with string data against the Pydantic model.
 |
 |      Args:
 |          obj: The object containing string data to validate.
 |          strict: Whether to enforce types strictly.
 |          context: Extra variables to pass to the validator.
 |
 |      Returns:
 |          The validated Pydantic model.
 |
 |  parse_file(path: 'str | Path', *, content_type: 'str | None' = None, encoding: 'str' = 'utf8', proto: 'DeprecatedParseProtocol | None' = None, allow_pickle: 'bool' = False) -> 'Self'
 |
 |  parse_obj(obj: 'Any') -> 'Self'
 |
 |  parse_raw(b: 'str | bytes', *, content_type: 'str | None' = None, encoding: 'str' = 'utf8', proto: 'DeprecatedParseProtocol | None' = None, allow_pickle: 'bool' = False) -> 'Self'
 |
 |  schema(by_alias: 'bool' = True, ref_template: 'str' = '#/$defs/{model}') -> 'Dict[str, Any]'
 |
 |  schema_json(*, by_alias: 'bool' = True, ref_template: 'str' = '#/$defs/{model}', **dumps_kwargs: 'Any') -> 'str'
 |
 |  update_forward_refs(**localns: 'Any') -> 'None'
 |
 |  validate(value: 'Any') -> 'Self'
 |
 |  ----------------------------------------------------------------------
 |  Readonly properties inherited from pydantic.main.BaseModel:
 |
 |  __fields_set__
 |
 |  model_computed_fields
 |      Get metadata about the computed fields defined on the model.
 |
 |      Deprecation warning: you should be getting this information from the model class, not from an instance.
 |      In V3, this property will be removed from the `BaseModel` class.
 |
 |      Returns:
 |          A mapping of computed field names to [`ComputedFieldInfo`][pydantic.fields.ComputedFieldInfo] objects.
 |
 |  model_extra
 |      Get extra fields set during validation.
 |
 |      Returns:
 |          A dictionary of extra fields, or `None` if `config.extra` is not set to `"allow"`.
 |
 |  model_fields
 |      Get metadata about the fields defined on the model.
 |
 |      Deprecation warning: you should be getting this information from the model class, not from an instance.
 |      In V3, this property will be removed from the `BaseModel` class.
 |
 |      Returns:
 |          A mapping of field names to [`FieldInfo`][pydantic.fields.FieldInfo] objects.
 |
 |  model_fields_set
 |      Returns the set of fields that have been explicitly set on this model instance.
 |
 |      Returns:
 |          A set of strings representing the fields that have been set,
 |              i.e. that were not filled from defaults.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from pydantic.main.BaseModel:
 |
 |  __dict__
 |      dictionary for instance variables
 |
 |  __pydantic_extra__
 |
 |  __pydantic_fields_set__
 |
 |  __pydantic_private__
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from pydantic.main.BaseModel:
 |
 |  __hash__ = None
 |
 |  __pydantic_root_model__ = False

All the models in genai.types that are then used to cast the interaction back and forth with the API are subclasses of the genai._common.BaseModel. Although this being a “private” module make it less than ideal, it will be useful to quickly monkey patch a better representation of the output.

Stopping sequences, system prompts, and streaming

types.GenerateContentConfigDict??
Init signature: types.GenerateContentConfigDict(self, /, *args, **kwargs)
Source:        
class GenerateContentConfigDict(TypedDict, total=False):
  """Optional model configuration parameters.
  For more information, see `Content generation parameters
  <https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/content-generation-parameters>`_.
  """
  http_options: Optional[HttpOptionsDict]
  """Used to override HTTP request options."""
  system_instruction: Optional[ContentUnionDict]
  """Instructions for the model to steer it toward better performance.
      For example, "Answer as concisely as possible" or "Don't use technical
      terms in your response".
      """
  temperature: Optional[float]
  """Value that controls the degree of randomness in token selection.
      Lower temperatures are good for prompts that require a less open-ended or
      creative response, while higher temperatures can lead to more diverse or
      creative results.
      """
  top_p: Optional[float]
  """Tokens are selected from the most to least probable until the sum
      of their probabilities equals this value. Use a lower value for less
      random responses and a higher value for more random responses.
      """
  top_k: Optional[float]
  """For each token selection step, the ``top_k`` tokens with the
      highest probabilities are sampled. Then tokens are further filtered based
      on ``top_p`` with the final token selected using temperature sampling. Use
      a lower number for less random responses and a higher number for more
      random responses.
      """
  candidate_count: Optional[int]
  """Number of response variations to return.
      """
  max_output_tokens: Optional[int]
  """Maximum number of tokens that can be generated in the response.
      """
  stop_sequences: Optional[list[str]]
  """List of strings that tells the model to stop generating text if one
      of the strings is encountered in the response.
      """
  response_logprobs: Optional[bool]
  """Whether to return the log probabilities of the tokens that were
      chosen by the model at each step.
      """
  logprobs: Optional[int]
  """Number of top candidate tokens to return the log probabilities for
      at each generation step.
      """
  presence_penalty: Optional[float]
  """Positive values penalize tokens that already appear in the
      generated text, increasing the probability of generating more diverse
      content.
      """
  frequency_penalty: Optional[float]
  """Positive values penalize tokens that repeatedly appear in the
      generated text, increasing the probability of generating more diverse
      content.
      """
  seed: Optional[int]
  """When ``seed`` is fixed to a specific number, the model makes a best
      effort to provide the same response for repeated requests. By default, a
      random number is used.
      """
  response_mime_type: Optional[str]
  """Output response media type of the generated candidate text.
      """
  response_schema: Optional[SchemaUnionDict]
  """Schema that the generated candidate text must adhere to.
      """
  routing_config: Optional[GenerationConfigRoutingConfigDict]
  """Configuration for model router requests.
      """
  safety_settings: Optional[list[SafetySettingDict]]
  """Safety settings in the request to block unsafe content in the
      response.
      """
  tools: Optional[ToolListUnionDict]
  """Code that enables the system to interact with external systems to
      perform an action outside of the knowledge and scope of the model.
      """
  tool_config: Optional[ToolConfigDict]
  """Associates model output to a specific function call.
      """
  labels: Optional[dict[str, str]]
  """Labels with user-defined metadata to break down billed charges."""
  cached_content: Optional[str]
  """Resource name of a context cache that can be used in subsequent
      requests.
      """
  response_modalities: Optional[list[str]]
  """The requested modalities of the response. Represents the set of
      modalities that the model can return.
      """
  media_resolution: Optional[MediaResolution]
  """If specified, the media resolution specified will be used.
    """
  speech_config: Optional[SpeechConfigUnionDict]
  """The speech generation configuration.
      """
  audio_timestamp: Optional[bool]
  """If enabled, audio timestamp will be included in the request to the
       model.
      """
  automatic_function_calling: Optional[AutomaticFunctionCallingConfigDict]
  """The configuration for automatic function calling.
      """
  thinking_config: Optional[ThinkingConfigDict]
  """The thinking features configuration.
      """
File:           ~/mambaforge/envs/prototypes/lib/python3.12/site-packages/google/genai/types.py
Type:           _TypedDictMeta
Subclasses:     

All the generation paramters can be passed as a dictionary. The available parameters are defined by the GenerateContentConfigDict model (which is just a snake case/dictionary conversion of GenerateContentConfig). This include things like temperature and top_k/top_p value as well as stopping sequences and system prompt (which is actually passed at each generation)

sr = c.models.generate_content(model=model, 
                               contents="Count from 1 to 5 and add a write a different animal after each number",
                               config={"stop_sequences": ["4"]})
print(sr.text)
Okay, here we go!

1 - Cat
2 - Dog
3 - Bird
spr = c.models.generate_content(model=model, 
                               contents="Count from 1 to 5 and add a write a different animal after each number",
                               config={"system_instruction": "Always talk in Spanish"})
print(spr.text)
¡Por supuesto! Aquí tienes:

1.  Uno... Perro
2.  Dos... Gato
3.  Tres... Elefante
4.  Cuatro... León
5.  Cinco... Ballena
for chunk in c.models.generate_content_stream(model=model, contents="Write a small poem about the hardships of being a cocker spaniel"):
    print(chunk.text, end='')
With ears so long, a tripping hazard,
Through muddy fields, a muddy blaggard.
A feathered tail, a joyful wag,
Yet burrs collect, a constant snag.

My soulful eyes, they plead and yearn,
For walks and treats, a love to earn.
But oh, the groomer, shears so bright,
A cocker's life, a fluffy fight.
type(chunk)
google.genai.types.GenerateContentResponse

Streaming content is handled with a separate method of models but everything else stays the same.

Image generation

mr = c.models.generate_images(
    model = imagen_models[0],
    prompt = "A roman mosaic of a boxing match between a feathered dinosaur and a cocker spaniel,\
refereed by Sasquatch, in front of a crowd of cheering otters.",
    config = {"number_of_images": 2}
)
type(mr)
google.genai.types.GenerateImagesResponse

Currently the only multimedia output supported is image generation via imagen models (although audio and image generation with Gemini 2.0 is currently being tested in restricted availability). This uses a separate method, and returns a different type of response. The possible options are defined in the pydantic model but not all of them are actually available (for example, the enhance_prompt option does not work with the Gemini API).

for genim in  mr.generated_images:
    genim.image.show()

mr.generated_images[0].image.save('match.png')

As usual, the response is a convoluted nested mess of pydantic models. The images are returned as a wrapper of a PIL.Image which gives access to a few convenience function for displaying and saving. Right now jpeg is the only possible output format.

Formatting Output

r.text
'Yes, I am ready to work! What can I do for you?\n'
r.to_json_dict()
{'candidates': [{'content': {'parts': [{'text': 'Yes, I am ready to work! What can I do for you?\n'}],
    'role': 'model'},
   'avg_logprobs': -0.36470311880111694,
   'finish_reason': 'STOP'}],
 'model_version': 'gemini-2.0-flash',
 'usage_metadata': {'candidates_token_count': 16,
  'prompt_token_count': 9,
  'total_token_count': 25},
 'automatic_function_calling_history': []}
r.model_fields_set
{'automatic_function_calling_history',
 'candidates',
 'model_version',
 'usage_metadata'}
type(r.usage_metadata), type(r.candidates), type(r.model_version)
(google.genai.types.GenerateContentResponseUsageMetadata, list, str)

We want to recursively navigate the nested tree of submodels and attributes. Each model in genai.types has three types of attributes:

  1. a genai.types module (like usage_metadata)
  2. a list (like candidates)
  3. a primitive type (like model_version)

We could extract the attributes of a model from the class itself, but to avoid cluttering the output, we can use the model_fields_set property to avoid attributes that are not set in the model instance.


source

get_repr

 get_repr (m, lab='')

Recurisvely fetch the markdown representation of genai.types fields, wrapping lists into <details> blocks

Exported source
def get_repr(m, lab=""):
    """Recurisvely fetch the markdown representation of genai.types fields, wrapping lists into `<details>` blocks"""
    if hasattr(m, '_repr_markdown_'): return m._repr_markdown_()
    if is_listy(m): return "\n".join([f"<details open='true'><summary>{lab}[{i}]</summary>{get_repr(li)}</details>" for i, li in enumerate(m)])
    if isinstance(m, dict): return "<ul>" + "\n".join([f"<li><b>{i}</b>: {get_repr(li, i)}</li>" for i, li in m.items()]) + "</ul>"
    if isinstance(m, bytes): return m[:10] + b'...'
    return str(m)
Markdown(get_repr([["A", "B", "C"], "b", "c", {"x": 2, "y": {"as": "sa"}}], "ex"))
ex[0]
[0] A
[1] B
[2] C
ex[1] b
ex[2] c
ex[3]
  • x: 2
  • y:
    • as: sa

The basic recursive loop is in place: we can handle genai.types (via their _repr_markdown_ methods), strings and lists. The handling of bytes is to avoid polluting (or crashing) the representation with a huge list of charachters in case of a multimodal response.


source

det_repr

 det_repr (m)
Exported source
def det_repr(m): return "<ul>" + "".join(f"<li><code>{d}</code>: {get_repr(getattr(m, d), d)}</li>" for d in m.model_fields_set) + "</ul>"
Markdown(det_repr(r.usage_metadata))
  • candidates_token_count: 16
  • total_token_count: 25
  • prompt_token_count: 9

Wrapping the details in a list makes for a cleaner and more readable look.

Exported source
@patch
def _repr_markdown_(self: genai._common.BaseModel):
    return det_repr(self)
r
  • usage_metadata:
    • candidates_token_count: 16
    • total_token_count: 25
    • prompt_token_count: 9
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -0.36470311880111694
    • content:
      • parts:
        parts[0]
        • text: Yes, I am ready to work! What can I do for you?
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:

By using fastcore’s patch on the _common.BaseModel we have made sure that all the models in genai.types have a nice consistent markdown representation. We can now refine the representation for some of the types.

Response representation

Exported source
@patch
def _repr_markdown_(self: genai.types.GenerateContentResponse):
    c = None
    try:
        c = self.text.replace("\n", "<br />")
    except ValueError as e:
        calls = (f"<code>{call.name}({', '.join([f'{a}={v}' for a, v in call.args.items()])})</code>" for call in self.function_calls)
        calls_repr = '\n'.join(f'<li>{c}</li>' for c in calls)
        c = f"<ul>{calls_repr}</ul>"
    dets = det_repr(self)
    return f"""{c}\n<details>{dets}</details>"""
r
Yes, I am ready to work! What can I do for you?
  • usage_metadata: Cached: 0; In: 9; Out: 16; Total: 25
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -0.36470311880111694
    • content:
      • parts:
        parts[0]
        • text: Yes, I am ready to work! What can I do for you?
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:

Multimodal response representation

mr
  • generated_images:
    generated_images[0]
    • image:
      • mime_type: image/png
      • image_bytes: b’89PNG1a…’
    generated_images[1]
    • image:
      • mime_type: image/png
      • image_bytes: b’89PNG1a…’

Image.html

 Image.html ()
Exported source
@patch(as_prop=True)
def html(self: types.Image):
    b64 = base64.b64encode(self.image_bytes).decode("utf-8")
    return f'<img src="data:{self.mime_type};base64,{b64}" />'


@patch
def _repr_markdown_(self: types.Image):
    return f"""<div style="width: 100px; height: auto;">{self.html}</div>

<details>
{det_repr(self)}
</details>"""
mr.generated_images[0].image
  • mime_type: image/png
  • image_bytes: b’89PNG1a…’

GenerateImagesResponse.img

 GenerateImagesResponse.img ()
Exported source
@patch(as_prop=True)
def img(self: types.GenerateImagesResponse):
    return self.generated_images[0].image._pil_image


@patch
def _repr_markdown_(self: types.GenerateImagesResponse):
    N = len(self.generated_images)
    cols = min(N, 4)
    rows = math.ceil(N / 4)
   
    ims = "".join([f"""<div style="display: grid; 
                    width: 100%; 
                    max-width: 1000px; 
                    height: auto;
                    margin: 0 auto; 
                    grid-template-columns: {rows}fr;
                    grid-template-rows: 1ft;
                   ">{gim.image.html}</div>""" 
                   for gim in self.generated_images])
    
    
    
    i = f"""
<div style="display: grid; 
                gap: 4px; 
                width: 100%;
                height: auto;
                max-width: 1000px; 
                margin: 0 auto; 
                padding: 4px;
                grid-template-columns: repeat({cols}, 1fr);
                grid-template-rows: repeat({rows}, 1fr);
                ">

{ims}

</div>
    """
    return f"""{i}

<details>
{det_repr(self)}
</details>
"""
mr.img

mr
  • generated_images:
    generated_images[0]
    • image:
      • mime_type: image/png
      • image_bytes: b’89PNG1a…’
    generated_images[1]
    • image:
      • mime_type: image/png
      • image_bytes: b’89PNG1a…’

Most of the time we probably return a single image and .img property gives a convenient way of fetching the PIL image. The markdown representation is not ideal, but turns out that sizing a variable number of images into a grid can be very tricky, so we’ll leave it at that for the moment.

Usage & query costs


source

usage

 usage (inp=0, out=0, cached=0)

A quicker and simpler constructor for the Usage Metadata model

Type Default Details
inp int 0 Number of input tokens (excluding cached)
out int 0 Number of output tokens
cached int 0 Number of cached tokens
Exported source
def usage(inp=0,     # Number of input tokens (excluding cached)
          out=0,     # Number of output tokens
          cached=0): # Number of cached tokens
    """A quicker and simpler constructor for the Usage Metadata model"""
    return types.GenerateContentResponseUsageMetadata(cached_content_token_count=cached, 
                                                      candidates_token_count=out, 
                                                      prompt_token_count=inp + cached, 
                                                      total_token_count=inp + out + cached)

Unusually, prompt_token_count includes both cached and uncached prompt tokens.

usage(0, 32, 12)
  • candidates_token_count: 32
  • total_token_count: 44
  • prompt_token_count: 12
  • cached_content_token_count: 12

As usual, constructor for models are very verbose, so we build a simpler version.


GenerateContentResponseUsageMetadata.total

 GenerateContentResponseUsageMetadata.total ()
Exported source
@patch(as_prop=True)
def cached(self: types.GenerateContentResponseUsageMetadata): 
    return self.cached_content_token_count or 0

@patch(as_prop=True)
def inp(self: types.GenerateContentResponseUsageMetadata): 
    return (self.prompt_token_count - self.cached) or 0

@patch(as_prop=True)
def out(self: types.GenerateContentResponseUsageMetadata): 
    return self.candidates_token_count or 0

@patch(as_prop=True)
def total(self: types.GenerateContentResponseUsageMetadata): 
    return self.total_token_count or self.prompt_token_count + self.candidates_token_count

GenerateContentResponseUsageMetadata.out

 GenerateContentResponseUsageMetadata.out ()

GenerateContentResponseUsageMetadata.inp

 GenerateContentResponseUsageMetadata.inp ()

GenerateContentResponseUsageMetadata.cached

 GenerateContentResponseUsageMetadata.cached ()
u = usage(1, 2, 3)
u.inp, u.out, u.cached, u.total
(1, 2, 3, 6)

We patch a few properties to make dealing with the usage object a bit less verbose.


GenerateContentResponseUsageMetadata.__repr__

 GenerateContentResponseUsageMetadata.__repr__ ()
Exported source
@patch
def __repr__(self: types.GenerateContentResponseUsageMetadata):
    return f"Cached: {self.cached}; In: {self.inp}; Out: {self.out}; Total: {self.total}"

@patch
def _repr_markdown_(self: types.GenerateContentResponseUsageMetadata):
    return self.__repr__()
u

Cached: 3; In: 1; Out: 2; Total: 6

Finally, we make the string and markdown representation a bit more readable and coherent with the ones in claudette. Since we patched _repr_markdown_ in the BaseModel, we need to “unpatch” it here.

r
Yes, I am ready to work! What can I do for you?
  • usage_metadata: Cached: 0; In: 9; Out: 16; Total: 25
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -0.36470311880111694
    • content:
      • parts:
        parts[0]
        • text: Yes, I am ready to work! What can I do for you?
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:

GenerateContentResponseUsageMetadata.__add__

 GenerateContentResponseUsageMetadata.__add__ (other)
Exported source
@patch
def __add__(self: types.GenerateContentResponseUsageMetadata, other):
    cached = getattr(self, "cached", 0) + getattr(other, "cached", 0)
    return usage(self.inp + other.inp, self.out + other.out, cached)
usage(5, 1) + usage(32, 32, 32)

Cached: 32; In: 37; Out: 33; Total: 102

Pricings


source

get_pricing

 get_pricing (model, prompt_tokens)
Exported source
# $/1M input (non cached) tokens, $/1M output tokens, $/1M cached input tokens, 

pricings = {
    'gemini-2.0-flash': [0.1, 0.4, 0.025],
    'gemini-2.0-flash-lite': [0.075, 0.3, 0.01875],
    'gemini-1.5-flash_short': [0.075, 0.3, 0.01875],
    'gemini-1.5-flash_long': [0.15, 0.6, 0.0375], 
    'gemini-1.5-flash-8b_short': [0.0375, 0.15, 0.01],
    'gemini-1.5-flash-8b_long': [0.075, 0.3, 0.02],
    'gemini-1.5-pro_short': [1.25, 5., 0.3125],   
    'gemini-1.5-pro_long': [2.5, 10., 0.625],
 }


audio_token_pricings = {
    'gemini-2.0-flash': [0.7, 0.4, 0.175],
}

def get_pricing(model, prompt_tokens):
    if "-exp-" in model: return [0, 0, 0]
    suff = "_long" if prompt_tokens > 128_000 else "_short"
    m = all_model_types.get(model, "#").split("#")[-1]
    m += suff if "1.5" in m else ""
    return pricings.get(m, [0, 0, 0])

The pricing of Gemini model queries is quite byzantine, with the price of a query on Gemini 1.5 dependent on the prompt length, while for Gemini 2.0 models it depends on the input type.

A few things to notice:

  • The differential pricings for audio tokens for Gemini 2.0 Flash is not implemented (and the cost for audio tokens caching will be active starting Feb 24, 2025)
  • Caching costs are not only per query. There is an added cost of storing the cache, which is computed separately and depends on the cached content storage time, as well as the number of tokens

TODO: for the moment we are ignoring these nuances in the cost calculations, but we might want to do more precise computations at a later date (and maybe include cost of multimedia generation with Gemini 2, when it becomes available).

for m in models:
    print(m, "SHORT PROMPT", get_pricing(m , 1_000))
    print(m, "LONG PROMPT", get_pricing(m , 1_000_000))
gemini-2.0-flash SHORT PROMPT [0.1, 0.4, 0.025]
gemini-2.0-flash LONG PROMPT [0.1, 0.4, 0.025]
gemini-2.0-flash-001 SHORT PROMPT [0.1, 0.4, 0.025]
gemini-2.0-flash-001 LONG PROMPT [0.1, 0.4, 0.025]
gemini-2.0-pro-exp-02-05 SHORT PROMPT [0, 0, 0]
gemini-2.0-pro-exp-02-05 LONG PROMPT [0, 0, 0]
gemini-2.0-flash-lite-preview-02-05 SHORT PROMPT [0.075, 0.3, 0.01875]
gemini-2.0-flash-lite-preview-02-05 LONG PROMPT [0.075, 0.3, 0.01875]
gemini-1.5-flash SHORT PROMPT [0.075, 0.3, 0.01875]
gemini-1.5-flash LONG PROMPT [0.15, 0.6, 0.0375]
gemini-1.5-pro SHORT PROMPT [1.25, 5.0, 0.3125]
gemini-1.5-pro LONG PROMPT [2.5, 10.0, 0.625]
gemini-1.5-pro-002 SHORT PROMPT [1.25, 5.0, 0.3125]
gemini-1.5-pro-002 LONG PROMPT [2.5, 10.0, 0.625]
gemini-1.5-flash-8b SHORT PROMPT [0.0375, 0.15, 0.01]
gemini-1.5-flash-8b LONG PROMPT [0.075, 0.3, 0.02]
gemini-2.0-flash-thinking-exp-01-21 SHORT PROMPT [0, 0, 0]
gemini-2.0-flash-thinking-exp-01-21 LONG PROMPT [0, 0, 0]

GenerateContentResponse.cost

 GenerateContentResponse.cost ()
Exported source
@patch(as_prop=True)
def cost(self: types.GenerateContentResponse):
    ip, op, cp = get_pricing(self.model_version, self.usage_metadata.prompt_token_count)
    return ((self.usage_metadata.inp * ip) + (self.usage_metadata.out * op) + (self.usage_metadata.cached * cp)) / 1e6
r.cost
7.3e-06

GenerateImagesResponse.cost

 GenerateImagesResponse.cost ()
Exported source
@patch(as_prop=True)
def cost(self: types.GenerateImagesResponse): return 0.03 * len(self.generated_images)
mr.cost
0.06

There is some inconsistency in the pricing of Imagen models: according to this page it’s $0.03 per image generated, while according to the pricing page it’s $0.03 per million tokens (but there is no way of counting tokens on the image generated). Until clarified, we’ll stick with the former for simplicity.

Client

We actually don’t neeed to create a new client object from scratch. Thanks to fastcore we can actually extend the genai.Codels class to give it what we want. Namely:

  • Simple multimodal prompt handling
  • A simpler generation interface, coherent with claudette and cosette
  • Cost and usage tracking

We can the build on this to add capabilities for tool usage, multimodal outputs etc.

Creating messages

Messages sent to Gemini are made of a list of Parts, which can be text, or multimedia parts. Some multimedia files can be inlined as bytes (in particular images), but others need to be uploaded using the file API first. Although this is quite flexible, it’s a bit clunky, so we want to make it easier.


source

mk_part

 mk_part (inp:Union[str,pathlib.Path,google.genai.types.Part,google.genai.
          types.File,PIL.Image.Image],
          c:google.genai.client.Client|None=None)

Turns an input fragment into a multimedia Part to be sent to a Gemini model

Exported source
def mk_part(inp: Union[str, Path, types.Part, types.File, PIL.Image.Image], c: genai.Client|None=None):
    "Turns an input fragment into a multimedia `Part` to be sent to a Gemini model"
    api_client = c or genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    if isinstance(inp, (types.Part, types.File, PIL.Image.Image)): return inp
    p_inp = Path(inp)
    if p_inp.exists():
        mt = mimetypes.guess_type(p_inp)[0]
        if mt.split("/")[0] == "image": return types.Part.from_bytes(data=p_inp.read_bytes(), mime_type=mt)
        return api_client.files.upload(file=p_inp)
    return types.Part.from_text(text=inp)

Notice that we cannot make mk_part a completely standalone function. Having access to the files API requires an a client. We could pass the genai.Client we have created before (or monkey patch mk_part into a class that has access to a client already), but for testing we create a new client each time.

mk_part("Hello World")
  • text: Hello World
pimg = mr.generated_images[0].image._pil_image
mk_part(pimg)

# This will take a bit of time, since the pdf needs to be uploaded
f = mk_part(Path("DeepSeek_R1.pdf"))
f
  • mime_type: application/pdf
  • uri: https://generativelanguage.googleapis.com/v1beta/files/npm1fry2anf6
  • name: files/npm1fry2anf6
  • expiration_time: 2025-02-16 22:23:14.562145+00:00
  • update_time: 2025-02-14 22:23:14.712394+00:00
  • sha256_hash: ZDEzNTQwZDY0MDA1ODY3YmNjYjZkMzljYWQ3NTU0NzQwYTJiYjZlOTc5NmU5YjQ2YWJjM2JhYTliNWI4OGZhZQ==
  • state: FileState.ACTIVE
  • size_bytes: 1326429
  • source: FileSource.UPLOADED
  • create_time: 2025-02-14 22:23:14.712394+00:00
# This should be instant
mk_part(f)
  • mime_type: application/pdf
  • uri: https://generativelanguage.googleapis.com/v1beta/files/npm1fry2anf6
  • name: files/npm1fry2anf6
  • expiration_time: 2025-02-16 22:23:14.562145+00:00
  • update_time: 2025-02-14 22:23:14.712394+00:00
  • sha256_hash: ZDEzNTQwZDY0MDA1ODY3YmNjYjZkMzljYWQ3NTU0NzQwYTJiYjZlOTc5NmU5YjQ2YWJjM2JhYTliNWI4OGZhZQ==
  • state: FileState.ACTIVE
  • size_bytes: 1326429
  • source: FileSource.UPLOADED
  • create_time: 2025-02-14 22:23:14.712394+00:00

TODO: Notice that we are not handling the case when the file is expired, but these should not be long term objects anyways.

mk_part("match.png")
  • inline_data:
    • mime_type: image/png
    • data: b’89PNG1a…’

Genai already handles PIL images, and Files input fragment. We could call explicitly the genai._transformers.t_part (it’s actually imported as genai.models.t.t_part) function here to make sure that mk_part always returns a types.Part, but

  • Why do the extra work?
  • Using something from a “private” module like _transformers is probably not a great idea, although technically this is exposed as genai.models.t

source

mk_parts

 mk_parts (inps, c=None)
Exported source
def is_texty(o): return isinstance(o, str) or (isinstance(o, types.Part) and bool(o.text))

def mk_parts(inps, c=None):
    cts = L(inps).map(mk_part, c=c) if inps else L(" ")
    return list(cts) if len(cts) > 1 or is_texty(cts[0]) else list(cts + [" "])

source

is_texty

 is_texty (o)

Gemini does not like empty inputs or messages with just a single media file, so in those cases we append a string with a single space to the content. We need to convert back from L to actual list because the former triggers the type checking of the api.

try: a = c.models.generate_content(model="gemini-2.0-flash", contents="")
except ValueError: b = c.models.generate_content(model="gemini-2.0-flash", contents=mk_parts(""))
b
Please provide me with more context! I need to know what you’d like me to do with your request. For example, are you asking me to:

* Write something? If so, what should I write about? What is the topic, the audience, and the desired tone?
* Summarize something? If so, please provide the text you want me to summarize.
* Answer a question? If so, what is the question?
* Generate code? If so, what language should I use and what should the code do?
* Translate something? If so, what language should I translate to and from, and what is the text to be translated?

The more information you give me, the better I can understand your request and provide a helpful response.
  • usage_metadata: Cached: 0; In: 1; Out: 174; Total: 175
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -0.20099690316737384
    • content:
      • parts:
        parts[0]
        • text: Please provide me with more context! I need to know what you’d like me to do with your request. For example, are you asking me to:

          • Write something? If so, what should I write about? What is the topic, the audience, and the desired tone?
          • Summarize something? If so, please provide the text you want me to summarize.
          • Answer a question? If so, what is the question?
          • Generate code? If so, what language should I use and what should the code do?
          • Translate something? If so, what language should I translate to and from, and what is the text to be translated?
          The more information you give me, the better I can understand your request and provide a helpful response.
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:
parts = mk_parts(["DeepSeek_R1.pdf", "What is this?"], c=c)

c.models.generate_content(model="gemini-2.0-flash", contents=parts)
This is a research paper titled “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning” by DeepSeek-AI. The paper introduces their first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, and their training process using reinforcement learning. It also discusses distilling reasoning capabilities from DeepSeek-R1 to smaller dense models. The paper includes benchmark results for DeepSeek-R1 and its distilled versions, comparing them to other models like OpenAI’s models and open-source models.
  • usage_metadata: Cached: 0; In: 5680; Out: 115; Total: 5795
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -0.2893808779509171
    • content:
      • parts:
        parts[0]
        • text: This is a research paper titled “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning” by DeepSeek-AI. The paper introduces their first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, and their training process using reinforcement learning. It also discusses distilling reasoning capabilities from DeepSeek-R1 to smaller dense models. The paper includes benchmark results for DeepSeek-R1 and its distilled versions, comparing them to other models like OpenAI’s models and open-source models.
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:

Generation interface

We start with a simple definition of __call__ on the client. Adding functionalities a bit at a time.

@patch
def __call__(self: genai.Client, inps, **kwargs):
    model = getattr(self, "model", None) or kwargs.get("model", None)
    parts = mk_parts(inps, self)
    return self.models.generate_content(model=model, contents=parts)
c(["match.png", "What is this?"], model=models[0])
This is a whimsical and surreal illustration depicting a boxing match between a feathered dinosaur-like creature and a Cocker Spaniel dog. They are in a boxing ring, wearing boxing gloves and surrounded by a crowd of various animals, mostly otters and dog-like creatures. There is a human boy and a Bigfoot-like creature supervising the match. The whole scene has a mosaic-like texture and is done in a detailed and illustrative style.
  • usage_metadata: Cached: 0; In: 1294; Out: 89; Total: 1383
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -0.5769802907879433
    • content:
      • parts:
        parts[0]
        • text: This is a whimsical and surreal illustration depicting a boxing match between a feathered dinosaur-like creature and a Cocker Spaniel dog. They are in a boxing ring, wearing boxing gloves and surrounded by a crowd of various animals, mostly otters and dog-like creatures. There is a human boy and a Bigfoot-like creature supervising the match. The whole scene has a mosaic-like texture and is done in a detailed and illustrative style.
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:

Cost and usage tracking


Client.cost

 Client.cost ()
Exported source
@patch(as_prop=True)
def use(self: genai.Client): return getattr(self, "_u", usage())

@patch(as_prop=True)
def cost(self: genai.Client): return getattr(self, "_cost", 0)

@patch
def _r(self: genai.Client, r):
    self.result = r
    self._u = self.use + getattr(r, "usage_metadata", usage())
    self._cost = self.cost + r.cost
    return r

Client.use

 Client.use ()

We call _r after each generation has been completed (both for Gemini and Imagen models) to keep track of token usage and costs as well as storing the latest response from the model.

c._r(mr)
c._r(r)

c.use, c.cost
(Cached: 0; In: 9; Out: 16; Total: 25, 0.0600073)
@patch
def __call__(self: genai.Client, inps, **kwargs):
    model = getattr(self, "model", None) or kwargs.get("model", None)
    parts = mk_parts(inps, self)
    r = self.models.generate_content(model=model, contents=parts)
    return self._r(r)
c("Write me a very long poem", model=models[0])

c.use

Cached: 0; In: 15; Out: 1609; Total: 1624

c.result
Okay, here’s a very long poem, or at least the start of one that could become very long. I’ll focus on creating a sense of vastness and mystery, leaving room for many different threads to be picked up and explored further. This is designed to be evocative and open-ended. Consider this just the first few “cantos” or sections. Let me know if you’d like me to continue a specific thread or explore a new one.

The Song of Echoing Sands

(Canto I: Genesis of Dust)

The wind, a sculptor with invisible hands,
Across the desert, writes and re-arranges sands.
A parchment boundless, sun-bleached, cracked, and old,
Where secrets sleep, and stories wait untold.

Before the pyramids, before the pharaoh’s might,
Before the first star shivered in the night,
There was the dust, a swirling, formless grace,
A canvas blank, in time’s unending space.

From shattered mountains, ground to finest grain,
By patient rivers, whispering of pain,
From ancient seas, evaporated and dried,
Where leviathans in silent slumber hide,

The dust arose, a testament to change,
A silent witness to life’s ebb and range.
It cradled seeds, and quenched the thirsty root,
And bore the footprints of the wandering loot.

The sun, a furnace in the azure dome,
Baked the land to bronze, a desolate home.
Yet even here, where life seems scarce and grim,
A hidden pulse beats strongly, on the desert’s limb.

A lizard, emerald bright, upon a stone,
A falcon circling, fiercely, all alone,
A scorpion, beneath a buried shard,
Life clings and battles, ever on its guard.

The wind remembers whispers of the past,
Of empires risen, destined not to last.
Of caravans that vanished in the haze,
And forgotten gods, of long-departed days.

It sings a mournful dirge, a lonely, haunting sound,
Across the dunes, where only silence is found.
A song of echoes, carried on the breeze,
A symphony of sorrow, in the whispering trees…
(Except there are no trees, only the memory of them, perhaps).

(Canto II: The City of Whispers)

The shifting sands conceal, and they reveal,
A city buried, secrets to conceal.
Its towers crumble, swallowed by the earth,
A forgotten kingdom, proving little worth.

The wind uncovers, then it covers deep,
The fragments of a dream, where phantoms sleep.
A mosaic shattered, beauty turned to waste,
A haunting reminder of time’s relentless haste.

Imagine walls, once painted vibrant hue,
Adorned with frescoes, stories fresh and new.
Of kings and queens, their opulent display,
Their loves and losses, vanished yesterday.

The marketplace, a hub of bustling trade,
Where spices mingled, fortunes were made.
The cries of vendors, echoing no more,
Just silence reigns, behind a crumbling door.

A temple stands, its columns cracked and worn,
Where ancient rituals, were fervently sworn.
The priests have vanished, prayers unanswered lie,
Beneath the gaze of an indifferent sky.

But something lingers, in the desert air,
A sense of presence, a burden to bear.
A whisper soft, carried on the breeze,
A memory stirring, among the silent trees…
(Still, the memory of trees. What did this land look like before the desert?)

(Canto III: The Oracle’s Dream)

Beneath the sand, in chambers dark and deep,
An oracle slumbers, lost in endless sleep.
Her mind a tapestry, of visions bright,
Of futures forming, shrouded in the night.

She dreams of serpents, coiled in golden rings,
Of winged creatures, soaring on the winds.
Of ancient symbols, etched in starlit stone,
Of destinies unfolding, all alone.

She sees the rise, and fall of mighty states,
The clash of armies, sealing tragic fates.
She feels the sorrow, of a broken heart,
And witnesses the moment, empires fall apart.

Her dreams are warnings, whispered on the air,
To those who listen, and who truly care.
But few can hear, above the desert’s hum,
The oracle’s message, destined to succumb.

For time is ruthless, and it marches on,
Oblivious to the battles fought and won.
The city sleeps, beneath the shifting sand,
A testament to folly, in a desolate land.

And the oracle dreams, on and on and on…
Her visions fading, with the setting sun.
A silent prophet, in a world unknown,
Her secrets buried, beneath the desert stone.

(Canto IV: The Wanderer’s Path)

Across the dunes, a solitary figure strides,
A wanderer lost, where nothing truly hides.
His face is weathered, etched with lines of care,
His eyes reflect the emptiness he’s there.

He carries little, just a tattered map,
And memories that fill a gaping trap.
He seeks a legend, whispered in the breeze,
A hidden oasis, beneath the dying trees…
(Ah, the trees again! He’s searching for something he believes to be real, a memory of green amidst the brown).

He’s heard the stories, of a hidden spring,
Where water flows, and birds begin to sing.
A sanctuary found, in this forsaken place,
A glimmer of hope, in time and endless space.

But doubts assail him, as the days grow long,
And thirst and hunger, make him weak and wrong.
Is it a mirage, a cruel and empty lie?
Or does salvation, truly wait nearby?

He stumbles onward, driven by despair,
His weary spirit, burdened by its share.
He scans the horizon, searching for a sign,
A glimmer of green, a promise to align.

Perhaps he’ll find it, or perhaps he’ll fall,
A silent victim, answering desert’s call.
The wind will bury, footprints in the sand,
And leave him nameless, in this barren land.

But even in death, a story will remain,
A whisper carried, on the wind’s refrain.
Of hope that flickered, then began to fade,
A wanderer’s journey, tragically betrayed.

(To be continued…)

This is just the beginning. Where should we go next? Here are a few options, or suggest your own:

* Explore the Wanderer’s backstory: Who is he? What drove him to the desert? What is he running from?
* Delve deeper into the Oracle’s visions: What specific events does she foresee? Are there ways to change them?
* Uncover more of the buried city’s history: What caused its downfall? Were there survivors?
* Introduce a new character: Perhaps a scavenger, a merchant, or another traveler seeking something in the desert.
* Focus on the ecology of the desert: Explore the plants, animals, and unique adaptations that allow life to thrive in this harsh environment.
* Expand on the mythology of the region: Are there local legends, folk tales, or beliefs that shape the lives of those who live there?

Let me know what intrigues you the most, and I’ll continue the poem!
  • usage_metadata: Cached: 0; In: 6; Out: 1593; Total: 1599
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -0.5863409527277542
    • content:
      • parts:
        parts[0]
        • text: Okay, here’s a very long poem, or at least the start of one that could become very long. I’ll focus on creating a sense of vastness and mystery, leaving room for many different threads to be picked up and explored further. This is designed to be evocative and open-ended. Consider this just the first few “cantos” or sections. Let me know if you’d like me to continue a specific thread or explore a new one.

          The Song of Echoing Sands

          (Canto I: Genesis of Dust)

          The wind, a sculptor with invisible hands, Across the desert, writes and re-arranges sands. A parchment boundless, sun-bleached, cracked, and old, Where secrets sleep, and stories wait untold.

          Before the pyramids, before the pharaoh’s might, Before the first star shivered in the night, There was the dust, a swirling, formless grace, A canvas blank, in time’s unending space.

          From shattered mountains, ground to finest grain, By patient rivers, whispering of pain, From ancient seas, evaporated and dried, Where leviathans in silent slumber hide,

          The dust arose, a testament to change, A silent witness to life’s ebb and range. It cradled seeds, and quenched the thirsty root, And bore the footprints of the wandering loot.

          The sun, a furnace in the azure dome, Baked the land to bronze, a desolate home. Yet even here, where life seems scarce and grim, A hidden pulse beats strongly, on the desert’s limb.

          A lizard, emerald bright, upon a stone, A falcon circling, fiercely, all alone, A scorpion, beneath a buried shard, Life clings and battles, ever on its guard.

          The wind remembers whispers of the past, Of empires risen, destined not to last. Of caravans that vanished in the haze, And forgotten gods, of long-departed days.

          It sings a mournful dirge, a lonely, haunting sound, Across the dunes, where only silence is found. A song of echoes, carried on the breeze, A symphony of sorrow, in the whispering trees… (Except there are no trees, only the memory of them, perhaps).

          (Canto II: The City of Whispers)

          The shifting sands conceal, and they reveal, A city buried, secrets to conceal. Its towers crumble, swallowed by the earth, A forgotten kingdom, proving little worth.

          The wind uncovers, then it covers deep, The fragments of a dream, where phantoms sleep. A mosaic shattered, beauty turned to waste, A haunting reminder of time’s relentless haste.

          Imagine walls, once painted vibrant hue, Adorned with frescoes, stories fresh and new. Of kings and queens, their opulent display, Their loves and losses, vanished yesterday.

          The marketplace, a hub of bustling trade, Where spices mingled, fortunes were made. The cries of vendors, echoing no more, Just silence reigns, behind a crumbling door.

          A temple stands, its columns cracked and worn, Where ancient rituals, were fervently sworn. The priests have vanished, prayers unanswered lie, Beneath the gaze of an indifferent sky.

          But something lingers, in the desert air, A sense of presence, a burden to bear. A whisper soft, carried on the breeze, A memory stirring, among the silent trees… (Still, the memory of trees. What did this land look like before the desert?)

          (Canto III: The Oracle’s Dream)

          Beneath the sand, in chambers dark and deep, An oracle slumbers, lost in endless sleep. Her mind a tapestry, of visions bright, Of futures forming, shrouded in the night.

          She dreams of serpents, coiled in golden rings, Of winged creatures, soaring on the winds. Of ancient symbols, etched in starlit stone, Of destinies unfolding, all alone.

          She sees the rise, and fall of mighty states, The clash of armies, sealing tragic fates. She feels the sorrow, of a broken heart, And witnesses the moment, empires fall apart.

          Her dreams are warnings, whispered on the air, To those who listen, and who truly care. But few can hear, above the desert’s hum, The oracle’s message, destined to succumb.

          For time is ruthless, and it marches on, Oblivious to the battles fought and won. The city sleeps, beneath the shifting sand, A testament to folly, in a desolate land.

          And the oracle dreams, on and on and on… Her visions fading, with the setting sun. A silent prophet, in a world unknown, Her secrets buried, beneath the desert stone.

          (Canto IV: The Wanderer’s Path)

          Across the dunes, a solitary figure strides, A wanderer lost, where nothing truly hides. His face is weathered, etched with lines of care, His eyes reflect the emptiness he’s there.

          He carries little, just a tattered map, And memories that fill a gaping trap. He seeks a legend, whispered in the breeze, A hidden oasis, beneath the dying trees… (Ah, the trees again! He’s searching for something he believes to be real, a memory of green amidst the brown).

          He’s heard the stories, of a hidden spring, Where water flows, and birds begin to sing. A sanctuary found, in this forsaken place, A glimmer of hope, in time and endless space.

          But doubts assail him, as the days grow long, And thirst and hunger, make him weak and wrong. Is it a mirage, a cruel and empty lie? Or does salvation, truly wait nearby?

          He stumbles onward, driven by despair, His weary spirit, burdened by its share. He scans the horizon, searching for a sign, A glimmer of green, a promise to align.

          Perhaps he’ll find it, or perhaps he’ll fall, A silent victim, answering desert’s call. The wind will bury, footprints in the sand, And leave him nameless, in this barren land.

          But even in death, a story will remain, A whisper carried, on the wind’s refrain. Of hope that flickered, then began to fade, A wanderer’s journey, tragically betrayed.

          (To be continued…)

          This is just the beginning. Where should we go next? Here are a few options, or suggest your own:

          • Explore the Wanderer’s backstory: Who is he? What drove him to the desert? What is he running from?
          • Delve deeper into the Oracle’s visions: What specific events does she foresee? Are there ways to change them?
          • Uncover more of the buried city’s history: What caused its downfall? Were there survivors?
          • Introduce a new character: Perhaps a scavenger, a merchant, or another traveler seeking something in the desert.
          • Focus on the ecology of the desert: Explore the plants, animals, and unique adaptations that allow life to thrive in this harsh environment.
          • Expand on the mythology of the region: Are there local legends, folk tales, or beliefs that shape the lives of those who live there?
          Let me know what intrigues you the most, and I’ll continue the poem!
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:

Streaming generation

Exported source
@patch(as_prop=True)
def _parts(self: types.GenerateContentResponse): return nested_idx(self, "candidates", 0, "content", "parts") or []
    

@patch
def _stream(self: genai.Client, s):
    all_parts = []
    for r in s:
        all_parts.extend(r._parts)
        yield r.text
    r.candidates[0].content.parts = all_parts
    self._r(r)

To keep the behaviour coherent with Claudette’s when in streaming mode, we should only yield the text of the chunks, rather than the full response. The _stream method essentially replicates the text_stream of Anthropic’s SDK. Since there is no to Claude’s get_final_message we have to store the chunk parts as they are yielded. After the stream is exhausted, we substitute the saved parts into the final response (which contains the correct usage as well, and pass the result through _r)

@patch
def __call__(self: genai.Client, inps, stream=False, **kwargs):
    parts = mk_parts(inps, self)
    model = getattr(self, "model", None) or kwargs.get("model", None)
    parts = mk_parts(inps, self)
    gen_f = self.models.generate_content_stream if stream else self.models.generate_content
    r = gen_f(model=model, contents=parts)
    return self._stream(r) if stream else self._r(r)
c("Write me a short poem", model="gemini-2.0-flash")
The sun dips low, a fiery kiss,
Upon the clouds, a golden bliss.
The day sighs soft, a gentle breeze,
Rustling through the ancient trees.

And shadows stretch, long and deep,
As weary world prepares to sleep.
  • usage_metadata: Cached: 0; In: 5; Out: 54; Total: 59
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -0.35604388625533495
    • content:
      • parts:
        parts[0]
        • text: The sun dips low, a fiery kiss, Upon the clouds, a golden bliss. The day sighs soft, a gentle breeze, Rustling through the ancient trees.

          And shadows stretch, long and deep, As weary world prepares to sleep.
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:
for chunk in c("Write me a short poem", model="gemini-2.0-flash", stream=True):
    print(chunk, end="")
The sun dips low, a fiery kiss,
Upon the hills, a gentle bliss.
The shadows lengthen, cool and deep,
While weary world begins to sleep.

A single star begins to gleam,
A silent promise in a dream.
And in the quiet, hope remains,
To wash away the earthly stains.
c.result
The sun dips low, a fiery kiss,
Upon the hills, a gentle bliss.
The shadows lengthen, cool and deep,
While weary world begins to sleep.

A single star begins to gleam,
A silent promise in a dream.
And in the quiet, hope remains,
To wash away the earthly stains.
  • usage_metadata: Cached: 0; In: 5; Out: 69; Total: 74
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • content:
      • parts:
        parts[0]
        • text: The
        parts[1]
        • text: sun dips low, a fiery kiss, Upon the hills, a gentle bliss
        parts[2]
        • text: . The shadows lengthen, cool and deep, While weary world begins to sleep.

        parts[3]
        • text: A single star begins to gleam, A silent promise in a dream. And in the quiet, hope remains, To wash away the earthly stains
        parts[4]
        • text: .
      • role: model
    • finish_reason: FinishReason.STOP

Tool use

There are two ways of managing function calling. The other is to actually build the function declaration. The first approach has the advantage of enabling automatic function calling, meaning that whenever the LLM decides that the function needs to be called, it will call and get the result back. The main drawback is that it realies on the types.FunctionDeclaration.from_callable method, which is quite limited (mainly, it does not add descriptions to the parameters, most likely relying on docstrings following Google’s style guide).

The second approach requires manually declaring the function, but this won’t be picked up by the automatic function calling, so requires manually enabling the function loop (extracting the function calls from the response, calling the function and passing it back to the LLM).

def add1(
    a:int, # the 1st number to add
    b=0,   # the 2nd number to add
)->int:    # the result of adding `a` to `b`
    "Sums two numbers."
    return a+b

def add2(
    a:int, # the 1st number to add
    b:int=0,   # the 2nd number to add
)->int:    # the result of adding `a` to `b`
    "Sums two numbers."
    return a+b

def add3(
    a:int, # the 1st number to add
    b:int,   # the 2nd number to add
)->int:    # the result of adding `a` to `b`
    "Sums two numbers."
    return a+b

try:
    f_decl = types.FunctionDeclaration.from_callable(callable=add1, client=c)
except:
    try:
        f_decl = types.FunctionDeclaration.from_callable(callable=add2, client=c)
    except:
        f_decl = types.FunctionDeclaration.from_callable(callable=add3, client=c)
        

f_decl.to_json_dict()
{'description': 'Sums two numbers.',
 'name': 'add3',
 'parameters': {'type': 'OBJECT',
  'properties': {'a': {'type': 'INTEGER'}, 'b': {'type': 'INTEGER'}}}}
docments(add1, full=True, returns=False)
{ 'a': { 'anno': <class 'int'>,
         'default': <class 'inspect._empty'>,
         'docment': 'the 1st number to add'},
  'b': { 'anno': <class 'int'>,
         'default': 0,
         'docment': 'the 2nd number to add'}}

Notice that types.FunctionDeclaration.from_callable:

  1. Cannot infer the parameter type from the default value (while docments can)
  2. Does not use the default values at all: in fact adding default values to a function declaration passed to the Gemini API will raise an error, although the LLM would be able to use (and in fact does use if it’s passed), the “required” field of the “parameters”, which again could be inferred.

source

goog_doc

 goog_doc (f:<built-infunctioncallable>)

Builds the docstring for a docment style function following Google style guide

Type Details
f callable A docment style function
Returns str Google style docstring

When passing a function as a tool, the API will only access the function signature and docstring to build the FunctionDeclaration, so unless they are part of the docstring, the LLM has no way of knowing what the arguments or the returned value are. Assuming that Gemini models will have seen quite a lot of google code, and considering the examples in the documentation, it’s probably a good idea to turn the docstrings of tools into a format compatible with the style guide.

print(goog_doc(goog_doc))
Builds the docstring for a docment style function following Google style guide

Args:
    f: A docment style function

Returns:
    Google style docstring

source

prep_tool

 prep_tool (f:<built-infunctioncallable>, as_decl:bool=False,
            googlify_docstring:bool=True)

Optimizes for function calling with the Gemini api. Best suited for docments style functions.

Type Default Details
f callable The function to be passed to the LLM
as_decl bool False Return an enriched genai.types.FunctionDeclaration?
googlify_docstring bool True Use docments to rewrite the docstring following Google Style Guide?
Exported source
def _geminify(f: callable) -> callable:
    """Makes a function suitable to be turned into a function declaration: 
    infers argument types from default values and removes the values from the signature"""
    docs = docments(f, full=True)
    new_params = [inspect.Parameter(name=n,
                                    kind=inspect.Parameter.POSITIONAL_OR_KEYWORD,
                                    annotation=i.anno) for n, i in docs.items() if n != 'return']
    @wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    
    wrapper.__signature__ = inspect.Signature(new_params, return_annotation=docs['return']['anno'])
    wrapper.__annotations__ = {n: i['anno'] for n, i in docs.items() if n != 'return'}
    return wrapper


def prep_tool(f:callable, # The function to be passed to the LLM
             as_decl:bool=False,  # Return an enriched genai.types.FunctionDeclaration?
             googlify_docstring:bool=True): # Use docments to rewrite the docstring following Google Style Guide? 
    """Optimizes for function calling with the Gemini api. Best suited for docments style functions."""
    _f = _geminify(f)
    if googlify_docstring: _f.__doc__ = goog_doc(_f)
    if not as_decl: return _f
    f_decl = types.FunctionDeclaration.from_callable_with_api_option(callable=_f, api_option='GEMINI_API')
    for par, desc in docments(_f, returns=False).items():
        if desc: f_decl.parameters.properties[par].description = desc
    required_params = [p for p, d in docments(f, full=True, returns=False).items() if d['default'] == inspect._empty]
    f_decl.parameters.required = required_params
    return f_decl

To prepare a function to be used as a function declaration, it needs to be stripped of default values and all the arguments need to be annotated. Turning the docstrings in a Google compatible format makes sure that the result can be used for automatic function calling. We rely on FunctionDeclaration.from_callable either implicitly (when passing the prepped function to the LLM) or implicitly to do the necessary type conversions of the annotations (i.e. turning a float into NUMBER, a str into STRING etc.). If building the function declaration explicitly, we can also enrich it with information from the original function (namely the presence of default values and the arguments docments that can be added paramters objects).

x = prep_tool(add1, as_decl=True)
x
  • parameters:
    • required:
      required[0] a
    • properties:
      • a:
        • description: the 1st number to add
        • type: INTEGER
      • b:
        • description: the 2nd number to add
        • type: INTEGER
    • type: Type.OBJECT
  • description: Sums two numbers.

    Args: a: the 1st number to add b: the 2nd number to add

    Returns: the result of adding a to b
  • name: add1
respt.candidates[0].content.parts[0].to_json_dict()
{'function_call': {'args': {'a': 604542, 'b': 6458932}, 'name': 'add1'}}
@patch
def __call__(self: genai.Client, inps=None, stream=False, tools=None, **kwargs):
    config=dict()
    if tools: config['tools'] = [prep_tool(f) for f in tools if callable(f)] + [t for t in tools if isinstance(t, types.Tool)]
    parts = mk_parts(inps, self)
    self.query = types.Content(parts=parts, role="user")
    model = getattr(self, "model", None) or kwargs.get("model", None)
    parts = mk_parts(inps, self)
    gen_f = self.models.generate_content_stream if stream else self.models.generate_content
    r = gen_f(model=model, contents=parts, config=config if config else None)
    return self._stream(r) if stream else self._r(r)
globals()['add1'](**respt.function_calls[0].args)
7153531
tool = types.Tool(functionDeclarations=[x])

a,b = 694599,6458932
pr = f"What is {a}+{b}?"


respt = c(pr, model=model, tools=[tool])
respt
def sums(
    a:int,  # First number to sum 
    b=1 # Second number to sum
) -> int: # The sum of the inputs
    "Adds two numbers"
    print(f"Finding the sum of {a} and {b}")
    return a + b

a,b = 604542,6458932
pr = f"What is {a}+{b}?"
pr
'What is 604542+6458932?'
c(pr, model=model, tools=[sums])
Finding the sum of 604542 and 6458932
The sum of 604542 and 6458932 is 7063474.
  • usage_metadata: Cached: 0; In: 58; Out: 30; Total: 88
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -8.514967436591784e-05
    • content:
      • parts:
        parts[0]
        • text: The sum of 604542 and 6458932 is 7063474.
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:
    automatic_function_calling_history[0]
    • parts:
      parts[0]
      • text: What is 604542+6458932?
    • role: user
    automatic_function_calling_history[1]
    • parts:
      parts[0]
      • function_call:
        • args:
          • b: 6458932
          • a: 604542
        • name: sums
    • role: model
    automatic_function_calling_history[2]
    • parts:
      parts[0]
      • function_response:
        • response:
          • result: 7063474
        • name: sums
    • role: user
def mults(
    a:int,  # First thing to multiply
    b:int=1 # Second thing to multiply
) -> int: # The product of the inputs
    "Multiplies a * b."
    print(f"Finding the product of {a} and {b}")
    return a * b

pr = f'Calculate ({a}+{b})*2'
pr
'Calculate (604542+6458932)*2'
c(pr, model=model, tools=[sums, mults])
Finding the sum of 604542 and 6458932
Finding the product of 7063474 and 2
(604542+6458932)*2 = 14126948
  • usage_metadata: Cached: 0; In: 105; Out: 28; Total: 133
  • model_version: gemini-2.0-flash
  • candidates:
    candidates[0]
    • avg_logprobs: -0.0003874258005193302
    • content:
      • parts:
        parts[0]
        • text: (604542+6458932)*2 = 14126948
      • role: model
    • finish_reason: FinishReason.STOP
  • automatic_function_calling_history:
    automatic_function_calling_history[0]
    • parts:
      parts[0]
      • text: Calculate (604542+6458932)*2
    • role: user
    automatic_function_calling_history[1]
    • parts:
      parts[0]
      • function_call:
        • args:
          • b: 6458932
          • a: 604542
        • name: sums
    • role: model
    automatic_function_calling_history[2]
    • parts:
      parts[0]
      • function_response:
        • response:
          • result: 7063474
        • name: sums
    • role: user
    automatic_function_calling_history[3]
    • parts:
      parts[0]
      • function_call:
        • args:
          • b: 2
          • a: 7063474
        • name: mults
      parts[1]
      • text:
    • role: model
    automatic_function_calling_history[4]
    • parts:
      parts[0]
      • function_response:
        • response:
          • result: 14126948
        • name: mults
    • role: user